(57) ABSTRACT

Methods and apparatus are described for selecting operation devices or hardware components for a processor, such as an embedded processor having pipelined data paths. The process may include identifying a set of hardware components, such as function units, and a plurality of characteristics for those hardware components. A first set of characteristics may include the ability to add, subtract, multiply, and the like, or the components may be multi-functional. A second set of characteristics for the hardware components may include cost, throughput and the like. A plurality of these characteristics of the hardware components are incorporated into an algorithm, which is then solved for one or more desired parameters, such as type and number of hardware components. In one preferred embodiment, the least cost assembly of hardware components is found for carrying out a set of computations defined by an algorithm to be executed on the processor according to a preferred initiation interval.

21 Claims, 5 Drawing Sheets
Sample routine:

Loop for i=1,N1
    Loop for j=1,N2
        A=A+1
        B=2*B
        C=A-B
    end
end
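To make the loop nest concrete, the following sketch transcribes the sample routine into Python and tallies the operation types issued in one inner-loop iteration, which is the per-iteration count used later as the number of operations of each type. The initial values of A and B and the names `sample_routine`, `LOOP_BODY` and `count_operation_types` are hypothetical. The subtraction C=A-B is counted as an add-class operation, following the description's treatment of it as an add operation.

```python
from collections import Counter

def sample_routine(n1, n2, a=0, b=1):
    """Run the sample loop nest; the initial values of A and B are hypothetical."""
    c = 0
    for _i in range(n1):          # outer loop, i = 1..N1
        for _j in range(n2):      # inner loop, j = 1..N2
            a = a + 1             # add operation
            b = 2 * b             # multiply operation
            c = a - b             # subtract, executed on an adder
    return a, b, c

# Inner-loop body as (statement, operation type) pairs.
LOOP_BODY = [("A=A+1", "add"), ("B=2*B", "mul"), ("C=A-B", "add")]

def count_operation_types(body):
    """Return how many operations of each type one iteration issues."""
    return Counter(op_type for _stmt, op_type in body)

if __name__ == "__main__":
    print(sample_routine(2, 3))               # (6, 64, -58)
    print(count_operation_types(LOOP_BODY))   # Counter({'add': 2, 'mul': 1})
```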
FUNCTION UNIT ALLOCATION IN PROCESSOR DESIGN

RELATED APPLICATION DATA

This patent application is related to the following co-pending U.S. Patent applications, commonly assigned and filed concurrently with this application:

U.S. patent application Ser. No. 09/378,298, entitled PROGRAMMATIC SYNTHESIS OF PROCESSOR ELEMENT ARRAYS, by Robert S. Schreiber, Bantwal Ramakrishna Rau, Shail Aditya Gupta, Vinod Kumar Kathail and Sadun Anik; and

U.S. patent application Ser. No. […], entitled […] PROCESSOR DESIGN, by Robert S. Schreiber.

These patent applications are hereby incorporated by reference.

FIELD OF THE INVENTIONS

These inventions relate to processor design, and more specifically to allocation of function units in processor design, such as for systolic processors and application specific integrated processors (ASIPs).

RELATED ART

Processor design is a very time intensive and expensive process. For new and unique processor designs, no automated design techniques exist for selecting and designing the mix of processor components that would be incorporated into the final processor design. While there exist algorithms incorporated into software packages that can help in designing new processors, such software packages do not give a result which is a final design, let alone an optimal design. Typically, those software packages provide approximate solutions to a design problem, leading to additional design effort and over-design to account for their lack of precision. Additionally, the design process may start entirely from scratch, which would result in substantial time being consumed analyzing possible configurations before designing the details of the processor. On the other hand, designing a new processor from preexisting designs necessarily incorporates the benefits and flaws of the preexisting design, which may not be acceptable or optimal for the new design.

All conventional processor design software packages are heuristic in nature. In other words, they rely on design criteria and/or methods that in the past have proven more effective than other criteria or methods. However, in order to apply to more than one processor design or design methodology, such design criteria and methods must be sufficiently general to provide predictable results. Therefore, such heuristic software packages provide relatively high-level solutions without a complete contribution to details of the design. Additionally, heuristic software packages necessarily lead to significant trial and error in an attempt to optimize the processor design. Consequently, design of new processors is time intensive and expensive.

Processors are often designed to incorporate pipelined data paths to speed processing throughput, reduce initiation intervals and optimize use of the various function units, such as adders, multipliers, comparators, dividers and the like. These data paths are formed from an interconnected assembly of function units and register files. The function units and register files may be interconnected by busses. Because these data paths may include a large number of function units, register files and bus segments, the job of selecting the function units, register files and bus segments is very difficult, to say the least. The task is made more difficult if one desires to find the optimal configuration of these units having the lowest or a low cost relative to other configurations. Because of the huge number of possible configurations, it is difficult if not impossible to find optimal configurations.

Pipelined data paths are particularly useful in processing iterative instructions, such as those found in instruction loops, and especially nested instruction loops. Even when considering a subset of situations where the instruction loops are known, such as those used with embedded processors, the task of designing the optimal, low-cost processor still exists because of the large number of different function units, register configurations and bus configurations that are possible. Heuristic software design solutions used for designing processors are not suitable for finding solutions to such multi-dimensional problems. Because there are so many variables to consider, it is too difficult to optimize all of the variables to arrive at a suitable solution without great expenditure of time and effort.

SUMMARY

The present inventions provide methods and apparatus for efficiently producing computing systems, for example those incorporating processor arrays having processor elements with function or execution units, register files, busses, and the like. These methods and apparatus reduce the time required for designing these processors, reduce the amount of trial and error used in processor design, and produce a better combination of hardware components, as tested by one or more quantifiable parameters, than those developed using heuristic methods, leading to overall superior results. The methods and apparatus […] for allocating function units in processor design. They also eliminate the costs associated with starting the design of a new processor from scratch, which often may be necessary in the design of embedded processors, and they allow the more time intensive design process to start later in the conventional processor design flow.

These and other aspects of the present inventions are provided by methods and apparatus for selecting operation devices or hardware components for a processor, such as an embedded processor having pipelined data paths. The hardware components could be function units, register files, busses, or other items that could be incorporated into a processor. The apparatus can take any number of forms, including computers and other processors, such as mainframes, workstations, and the like, as well as apparatus containing instructions or data for use in controlling such processors, such as disk drives, removable storage media, and temporary storage. In one aspect of the present inventions, the process includes identifying a set of hardware components, such as function units, and a plurality of characteristics for those hardware components. A first set of characteristics may include a repertoire, such as the ability to add, subtract, multiply, and the like, and a second set of characteristics for the hardware components may include the number of cycles used for a given operation by the particular hardware component (the data interval, i.e., the number of time slots or cycles in each unit of type i for an operation of type j), cost and the like. A plurality of these characteristics of the hardware components are incorporated into an algorithm, which is then solved for one or more desired parameters, such as type and number of hardware components.
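The two sets of characteristics map naturally onto one record per function unit type: a repertoire of operation types, each with its data interval, plus a cost. The sketch below shows one possible encoding. The `FunctionUnitType` class, the unit names and all cost and data-interval figures are hypothetical placeholders, not values taken from this description.

```python
from dataclasses import dataclass, field

@dataclass
class FunctionUnitType:
    """One entry in a hypothetical function unit library."""
    name: str
    cost: float                       # second set of characteristics: cost
    # First set (repertoire): operation type -> data interval, i.e. the cycles
    # one unit of this type spends on one operation of that type.
    data_intervals: dict = field(default_factory=dict)

    def can_execute(self, op_type):
        return op_type in self.data_intervals

# Illustrative library: two single-function units and one multi-function ALU.
LIBRARY = [
    FunctionUnitType("adder", 1.0, {"add": 1}),
    FunctionUnitType("multiplier", 3.0, {"mul": 1}),
    FunctionUnitType("alu", 3.5, {"add": 1, "mul": 2}),
]

def units_for(op_type, library=LIBRARY):
    """Names of the unit types whose repertoire includes op_type."""
    return [u.name for u in library if u.can_execute(op_type)]
```

With this layout, `units_for("mul")` lists every unit type a solver may consider for multiplies, including the multi-function ALU.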



10/06/2003, EAST Version: 1.04.0000 



US 6,460,173 B1

In one preferred embodiment, the algorithm is an integer linear program which is loaded with a list of the set of function units available for incorporation into an embedded processor, the cost of each of the function units, the data intervals associated with the function units, and any other necessary data associated with the function units. The integer linear program is also preferably given one or more constraints, such as the requirement that the number of each of the function units must be an integer. Other constraints include the requirement that, for each operation type, the sum over all of the function units of the number of cycles carried out in a function unit for operations of that type, divided by the corresponding data intervals, is at least the number of operations of that type. The integer linear program may also be given bounds, such as a maximum cost for the result, or other constraints based on input from a user or designer.

Integer linear programs can quickly and efficiently provide a desired solution or an approximation to a solution. Where the result is a desired solution, the result can be used to design a complete processor and provide sufficient information to produce a hardware description expressed in a hardware design language such as VHDL. Where the result is an approximation to a solution, the designer can use that result to more quickly design a processor according to the defined design criteria for the processor.

Integer linear programs are particularly useful for providing solutions to multi-dimensional problems, and are particularly appropriate for allocating function units in processor elements of synchronous processor arrays. The possible combinations of function units are so numerous that optimal selection of function units for a particular processor design may be effectively impossible. This is especially the case with multi-function function units, where, because of a particular combination of operations, a multi-function function unit would serve better than a number of discrete function units carrying out the same operations. Algorithms used to solve an integer linear program are well known and produce reliable results.

In a further preferred form of the invention, the integer linear program is configured to minimize cost of the function units while still ensuring that all operations for a given set of instructions within a loop are included. Using an integer linear program to minimize cost is especially attractive for allocating function units in an embedded processor using data pipelining. Embedded processors will be carrying out predefined operations, many of which will be repetitive loop operations. Consequently, the function units need less flexibility to accommodate a variety of different operations. Thus, the focus in solving the integer linear program may be on minimizing the cost of the allocation of function units, knowing that a number of different function unit combinations are available for carrying out any given set of operations. While parameters other than cost can be optimized, minimizing cost for embedded processors is particularly attractive in view of the expected proliferation of embedded processors in equipment, appliances and other apparatus.

In another form of the inventions, an upper bound on the cost may be input to the integer linear program. Such additional input or constraint effectively limits the number of possible solutions to the integer linear program. If a given constraint would be violated by a particular set of combinations, the integer linear program can easily eliminate such combinations without otherwise testing whether or not the particular set of combinations meets all other criteria. Other constraints can also be imposed on the integer linear program based on heuristics known to the designer.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic and assembly drawing of an apparatus for selecting hardware components for a processor, such as an embedded systolic processor, including apparatus for receiving input to and delivering results from the selection process.

FIG. 2 is a schematic diagram of a processor, such as a pipelined processor, that may be designed in accordance with the apparatus and methods of the present inventions.

FIG. 3 is a schematic of a sample software routine segment that may be executed in a processor designed using the designs and methods of the present inventions.

FIG. 4 is a schematic of a machine instruction level sample routine derived from the software routine of FIG. 3.

FIG. 5 is a flow chart depicting a process for selecting hardware components for a processor in accordance with one aspect of the present inventions.

FIG. 6 is a detailed flow chart depicting one process for selecting hardware components for a processor using an integer linear program.

FIG. 7 is a flow chart depicting a process for evaluating the results derived from an integer linear program to see if those results represent a complete solution to a function unit allocation problem.

DETAILED DESCRIPTION

The inventions, some of which are summarized above, and defined by the enumerated claims may be better understood by referring to the following detailed description, which should be read in conjunction with the accompanying drawings. This detailed description of a particular preferred embodiment, set out below to enable one to build and use one particular implementation of the invention, is not intended to limit the enumerated claims, but to serve as a particular example thereof. The particular examples set out below are the preferred specific implementations of the function unit allocation system that can be used for a number of applications and implemented in a number of ways. The inventions, however, may also be applied to other systems as well.

In accordance with several aspects of the present inventions, apparatus and methods are disclosed for selecting hardware components for a processor, such as an embedded processor, which decrease the time and effort required for processor design. The methods and apparatus also reduce the amount of trial and error used during processor design, and produce a more predictable and definitive result than generic heuristic methods. The methods and apparatus also provide a starting point for final allocation of function units in embedded processors, especially those using pipelined data flow, and may even produce acceptable designs without any need for additional design work for allocating function units. Even if further design work is desired, the result from the apparatus and methods described herein provides a desirable […]. Consequently, the methods and apparatus significantly reduce the cost of […]. The methods and apparatus also can be used to […] the embedded processor design that has the least cost for carrying out a defined set of operations.

Cost […] represent a measurable quantity corresponding to the hardware component. In one sense, it means the cost of inserting the hardware component into the processor, including the number of switches and the layout for each component. In a more general sense, it may also mean the cost in power consumption during operation of the processor. Alternatively, the cost may be in terms of the amount of real estate or chip area occupied by the component. Similarly, the cost could be a
combination of these and/or other quantities, applied with or without desired weights. Therefore, cost refers to one or more quantifiable attributes of the hardware components.

In one aspect of the present inventions, the methods can be carried out on a pre-programmed digital processor, a general-purpose computer or a workstation, such as at 20 (FIG. 1), which can receive input from a conventional keyboard 22. The input can take the form of input data for use by algorithms processed by the computer 20, constraints to be placed on the processing of the algorithm by the computer, or other input or information as necessary. Information such as intermediate results, queries, requests for input, final results or other information can be displayed by the computer on a monitor 24 or output, as desired.

The computer 20 can receive applications programs and/or data from a number of different sources, including a removable storage device such as disk 26 for a disk drive 27 or any other movable tangible storage medium, such as portable disk drives and the like. Applications programs and data can also be received from a network host, host processor or mainframe computer 28, or from more remote locations such as off-site servers, over the Internet, or through other conventional communications paths. For example, data can be received from a satellite antenna 30 through a transmitter receiver 32 linked to an input and output port on the computer 20. Other linkages and communications methods can be used with the computer 20 in order to receive data and software, and to transmit data, results and software.

Particular applications to which the present inventions are directed include designing processors such as embedded processors. Embedded processors are used extensively as controllers or processors for equipment, appliances, entertainment devices, and the like. These processors have predefined functions and operations, and many of the operations are repetitive. These repetitive operations, by their nature, lend themselves to being carried out on function units, depicted schematically at 34 (FIG. 2), using register files 36, all arranged in such a way as to move incoming data and transfer the results of operations along a path termed a "pipeline". The pipeline arrangement takes advantage of the natural flow of data through the operations while minimizing register load requests and data transfers. Buses 38 may be used to interconnect the function units 34 and the register files 36. The present inventions can be used to quickly and inexpensively optimize the design of such embedded processors. While the following discussion will focus on embedded processors and their design, with particular emphasis on assemblies of function units, register files and buses configured to optimize data pipelining, it should be understood that the inventions described herein may be applicable to the design and manufacture of other processor configurations.

Processors operate based on instructions from a software program or other instruction source. Part of that software program or other instruction source may include a loop body or loop nest, represented in FIG. 3 by the sample routine 40. This sample routine would be part of a larger software program or other instruction source, but serves in this instance as a suitable example of a set of calculations that can be carried out on an embedded processor, such as one including an ensemble of function units, register files and possibly buses forming a pipelined data path. The sample routine may include an add operation 42, a multiply operation 44 and an add operation 46 within a loop 48, which in turn is nested within a loop 50. This loop nest can be evaluated and considered along with other requirements to determine possible Initiation Intervals (II) according to which the operations would be carried out on various function units. The Initiation Interval may take into account the cycle time required by the equipment or appliance into which the computing unit will be placed. For example, the Initiation Interval may depend at a macro level on the number of pages to be scanned per second in a scanner, the number of sheets to be output per minute by a printer or copier, and the like. It also may depend at a micro level on such factors as recurrences and the like. Using this information, the mix of function units will be determined in accordance with one aspect of the present inventions. If a suitably small initiation interval can be selected, a combination of function units can be selected that will carry out the operations defined by the software routine in a sufficiently small amount of time, and the selected function units are efficient and sufficiently low-cost to reduce the cost of the overall processor. One preferred approach for doing so is described herein.

By way of example, a simple operation will be described to demonstrate the process for using the initiation interval of the sample routine of FIG. 3 to start designing part of the processor. The initiation interval is the number of cycles between the start of one operation in a loop iteration and the start of the same operation in the next iteration of the loop; in other words, the number of cycles between successive iterations of the loop. The sample routine can be converted into an equivalent instruction level sample routine, depicted as 52 in FIG. 4. The instruction level sample routine 52 identifies the cycle times over which the operations of the inner loop 48 could be carried out on possible function units, such as an adder 54 and a multiplier 56. Other combinations of function units are possible, but this combination serves as a simple example. Analyzing this instruction level sample routine may, for example, lead to a selection of an initiation interval of II=4. Although other initiation intervals can be selected for the sample routine 40 and the instruction level sample routine 52, this initiation interval will be used as a starting point for this example.

Given an instruction level sample routine 52 and an initiation interval II, there are numerous function unit combinations that could be used to execute the operations defined by the sample routine 40 and instruction level sample routine 52 and which would achieve the initiation interval II. While many function unit combinations could be used in the end processor, it is also valuable to optimize one or more parameters of the computing unit, such as finding the least cost combination of function units that could be incorporated in the processor being designed. In accordance with one aspect of the present inventions, an integer linear program 58 (FIG. 1) is used to find a solution to the optimization problem of identifying the desired function units having the desired characteristics to execute the instruction level sample routine in the desired manner. It can be used for selecting function units to be incorporated into the desired processor, and it can do so based on a number of criteria, including characteristics of the function units, such as cost, throughput, and the like. The integer linear program can be used for evaluating the hardware components and selecting which ones best suit the purpose at hand. There are a number of algorithms which can be used to solve integer linear programs, and in one preferred embodiment, the routine known as CPLEX, marketed by ILOG, can be used to do so. The function units are adders, multipliers, arithmetic logic units, registers, busses, or the like, and their desired characteristics are lowest cost and data intervals that will allow the resource-based minimum initiation interval to be less than or equal to the defined initiation interval, II=4 in the given example. This information, along with various criteria, is then recast into an integer linear program which can be solved by the desired algorithm.
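The resource-based minimum initiation interval can be estimated for a candidate mix with the usual lower bound. For each function unit type, divide the cycles its bound operations occupy by the number of units of that type, round up, and take the maximum over types. The sketch below tries every binding of the loop-body operations to capable unit types in the mix and keeps the smallest such bound. The data intervals, the function name `res_mii` and the example mixes are hypothetical. This bound is necessary but not sufficient; an actual schedule must still be found, for example by a modulo scheduler.

```python
import itertools
import math

# Hypothetical data intervals: unit type -> {operation type: cycles per operation}.
DATA_INTERVALS = {
    "adder": {"add": 1},
    "multiplier": {"mul": 1},
    "alu": {"add": 1, "mul": 2},
}

def res_mii(ops, mix, data_intervals=DATA_INTERVALS):
    """Smallest resource-based lower bound on II over all operation bindings.

    ops -- operation types issued in one loop iteration, e.g. ["add", "mul", "add"]
    mix -- unit type -> number of units, e.g. {"adder": 1, "multiplier": 1}
    Returns math.inf if some operation has no capable unit in the mix.
    """
    choices = []
    for op in ops:
        capable = [u for u, n in mix.items() if n > 0 and op in data_intervals[u]]
        if not capable:
            return math.inf
        choices.append(capable)

    best = math.inf
    for binding in itertools.product(*choices):
        load = {u: 0 for u in mix}            # cycles demanded of each unit type
        for op, unit in zip(ops, binding):
            load[unit] += data_intervals[unit][op]
        bound = max(math.ceil(load[u] / mix[u]) for u in mix if mix[u] > 0)
        best = min(best, bound)
    return best

if __name__ == "__main__":
    body = ["add", "mul", "add"]                          # sample inner loop
    print(res_mii(body, {"adder": 1, "multiplier": 1}))   # 2
    print(res_mii(body, {"alu": 1}))                      # 4
```

Both mixes meet II=4 in this toy setting, which is why a cost objective is needed to choose between them.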

In one preferred embodiment of the inventions, a general-purpose computer, workstation or other processor 20 (FIG. 1) may accept as input an instruction program and parameter set 60 (FIG. 5). This information may be input from a keyboard 22, a removable storage medium 26, from a mainframe 28 or from a communications link. The instruction program and parameter set may include the initiation interval or a similar criterion, and a library of information about the function units or other components to be selected for inclusion in the final processor. The processor 20 will also include, already loaded or retrieved from another source, an algorithm, such as CPLEX, or the like, for solving an integer linear program. The processor 20 then creates and solves the integer linear program in step 62 using the instruction program and parameter set provided from step 60. The solution to the integer linear program is output at step 64 to memory, to the removable storage medium 26, the mainframe 28 or to a receiver over a communications link. Where the integer linear program is used to solve for the least cost set of suitable function units, the processor 20 outputs the list or vector of function units whose combination fits the criteria input to the integer linear program.

A conventional scheduling program, such as a modulo scheduler, schedules the operations and assigns operations to cycles and to the function units based on the instruction level sample routine 52 (FIG. 4), as represented at 66 in FIG. 5. The modulo scheduler can then test the allocation of the function units and the operations at step 68 to see whether the function units can carry out the required operations with an initiation interval less than or equal to the initiation interval II=4, in other words whether the actual initiation interval IIa equals the desired initiation interval IId. If successful, the processor 20 can design register files and allocate operands to register files at step 70, and it can also allocate register files and bus structure to function units at step 71. These steps at 70 and 71 can be carried out using a conventional heuristic algorithm or by using an integer linear program in a similar manner as was done for the function units, but treating a bus structure as a function unit and a data transfer as an operation. If not successful, the function unit set or mix found by the integer linear program can be adjusted at step 69 by adding and/or substituting function units, as more fully described below, after which the modulo scheduler can repeat step 66 until the desired mix is found. If the bus and register files are determined separately, as in steps 70 and 71, additional tests can be done to see if the desired mix is found. If the desired mix is not found, the system returns to step 70. If it is found, the system continues.
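The select, schedule and adjust loop of steps 62 through 69 can be summarized as a small driver. In this sketch the integer linear program solve, the modulo scheduler and the step 69 adjustment are passed in as callables, because their internals are not fixed at this point in the description. The function `allocate_function_units`, its parameters, the `max_rounds` guard and the toy stand-ins in the usage lines are all hypothetical.

```python
def allocate_function_units(solve_ilp, modulo_schedule, adjust_mix,
                            desired_ii, max_rounds=10):
    """Steps 62-69: pick a mix, try to schedule it, adjust until II is met.

    solve_ilp()            -> initial function unit mix (steps 62 and 64)
    modulo_schedule(mix)   -> achieved initiation interval II_a (steps 66 and 68)
    adjust_mix(mix, ii_a)  -> modified mix (step 69)
    Returns (mix, achieved_ii); raises RuntimeError if no mix meets desired_ii.
    """
    mix = solve_ilp()
    for _ in range(max_rounds):
        achieved_ii = modulo_schedule(mix)
        if achieved_ii <= desired_ii:        # test at step 68 succeeded
            return mix, achieved_ii          # continue with steps 70 and 71
        mix = adjust_mix(mix, achieved_ii)   # step 69: add or substitute units
    raise RuntimeError("no function unit mix met the desired initiation interval")

if __name__ == "__main__":
    # Toy stand-ins: each extra adder shortens the achieved II by one cycle.
    result = allocate_function_units(
        solve_ilp=lambda: {"adder": 1},
        modulo_schedule=lambda mix: 6 - mix["adder"],
        adjust_mix=lambda mix, ii: {**mix, "adder": mix["adder"] + 1},
        desired_ii=4,
    )
    print(result)  # ({'adder': 2}, 4)
```

The `max_rounds` guard is an addition for the sketch; the flow chart simply repeats until a suitable mix is found.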

Thereafter, the processor 20 or another suitable apparatus generates a hardware definition language description at step 72, such as VHDL, and outputs it so that it can be delivered, at step 74, to a manufacturing site. The manufacturing site then creates, at step 76, a suitable product, such as an embedded processor, that incorporates the function units and bus structure, or other linkage between the function units and register files, suitable to execute the instruction level sample routine.

An integer linear program is a preferred algorithm because it has known characteristics and is capable of solving multidimensional problems. In the present example, the multidimensional problem is to select a number of computation devices or hardware components, such as function units, from among a library of function units, each function unit having a first characteristic, such as the ability to add, subtract, multiply, compare, divide, take the absolute value, transfer data, store data, etc. This library of function units is substantial, given the number of multi-function function units, such as arithmetic logic units and the like, that are now available. These function units also have a second characteristic, including cost, and other characteristics, such as data interval and other attributes that may be selected or optimized. The number and variety of function units is quite large. Therefore, the selection of function units to be used to execute the operations defined by the instruction level sample routine is a complicated multi-dimensional problem. The integer linear program can be solved by conventional software algorithms, the solution of which includes the function unit ensemble that provides the desired resource-based minimum initiation interval and has the least cost of all such possible ensembles. In addition, the cost of the function unit ensemble obtained from the integer linear program provides the designer with a useful lower bound on the cost of the least expensive function unit ensemble that can achieve the desired initiation interval.

In the preferred embodiment, the register files and the 
operands are designed and allocated as function units in 
steps 64 and 66. Likewise, the bus structure is allocated 
between the register files and the function units in the same
manner. It is as efficient to combine these steps as part of the 
original integer linear program as to do them separately, or 
without using an integer linear program at all. When com- 
bined into one larger process, the steps at 70 and 71 are 
collapsed into 64 and 66, and the test at 68 is of the complete 
mix of potential hardware components as determined from 
the output of the algorithm. 

In the preferred embodiment, the instruction level sample 
routine will be used to find an initiation interval II. The
initiation interval is one of the input values for the integer 
linear program. The other values or parameters that are used 
to create the mixed integer linear program include the library
of function units and their known characteristics, which are 
then used to design criteria into the integer linear program, 
as discussed more fully below with respect to the method 
depicted in FIG. 6. 

As shown in FIG. 6, the system generates a mixed integer linear program with an initiation interval at step 80 and a function unit parameter set 82. These data may come from memory or a suitable input or other conventional source. The system, loaded with the algorithm to solve the integer linear program, also uses other parameters or constraints, such as a cost ceiling provided at 84 and any other constraints which may come from empirical information, heuristics or other knowledge of processor design. For example, the cost ceiling may come from a general knowledge of costs of current embedded processors, or the like. Providing a cost ceiling, or even a multiple of a reasonable cost ceiling, gives the integer linear program useful information that minimizes the possibility that the program follows potential solution paths that are empirically unreasonable.
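The effect of the cost ceiling at 84 can be seen in a plain enumeration of candidate mixes: any mix whose cost exceeds the ceiling is discarded before any scheduling or operation-count test is run on it. The sketch below is an exhaustive generator, not the search a solver such as CPLEX performs internally. The unit costs, the per-type limit `max_per_type` and the function name `mixes_under_ceiling` are hypothetical.

```python
import itertools

# Hypothetical unit costs for a three-entry library.
UNIT_COSTS = {"adder": 1.0, "multiplier": 3.0, "alu": 3.5}

def mixes_under_ceiling(unit_costs, cost_ceiling, max_per_type=2):
    """Yield (cost, mix) for every nonempty mix whose total cost <= ceiling.

    Mixes above the ceiling are rejected immediately, without testing
    whether they could execute the required operations.
    """
    names = sorted(unit_costs)
    for counts in itertools.product(range(max_per_type + 1), repeat=len(names)):
        if not any(counts):
            continue                          # an empty mix executes nothing
        cost = sum(unit_costs[n] * k for n, k in zip(names, counts))
        if cost > cost_ceiling:
            continue                          # pruned by the cost ceiling
        yield cost, dict(zip(names, counts))

if __name__ == "__main__":
    for cost, mix in sorted(mixes_under_ceiling(UNIT_COSTS, 4.0), key=lambda t: t[0]):
        print(cost, mix)
```

Only the surviving mixes would then be checked against the operation and cycle constraints, which is where the ceiling saves work.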

The system is also provided with or calculates at 86 $N_F$, which is the number of function unit types available, as determined by the library of function units. $N_F$ may also be predefined as one of the parameters associated with the library of function units. The system is also provided with or calculates $N_O$, which is the number of operation types that need to be scheduled, based on the understanding of the instruction level sample routine on which the function units will operate. In order to find the least cost assembly of
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function units, the Cost is minimized at step 88, where Cost 
is deteraiined from the following relationship: 



where: 

N^the number of different types of Function Units avaD- „ 
able; 

X~the cost incurred by using the Function Unit i; 
f,=the number of Function Units of type i needed to carry 
out the operations required by the software steps (an 
integer); 15 
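To make the cost relationship concrete, the following sketch encodes a small, entirely hypothetical function unit library, derives $N_F$ and $N_O$ from it, and evaluates Cost as the sum of $c_i f_i$ over the unit types. The unit names, costs and data intervals are invented for illustration and do not come from this description; the operation counts are those of the sample routine.

```python
# Hypothetical function unit library: unit type -> cost c_i and the data
# interval (cycles occupied) for each operation type the unit can execute.
LIBRARY = {
    "adder": {"cost": 1, "ops": {"add": 1, "sub": 1}},
    "multiplier": {"cost": 4, "ops": {"mul": 1}},
    "alu": {"cost": 3, "ops": {"add": 1, "sub": 1, "mul": 2}},
}

# Operation types to be scheduled, with counts from the sample routine body.
OPERATION_COUNTS = {"add": 1, "mul": 1, "sub": 1}

N_F = len(LIBRARY)           # number of function unit types
N_O = len(OPERATION_COUNTS)  # number of operation types


def total_cost(mix, library=LIBRARY):
    """Cost = sum over unit types i of c_i * f_i; mix maps unit type -> f_i."""
    return sum(library[name]["cost"] * count for name, count in mix.items())


print(N_F, N_O)                                   # 3 3
print(total_cost({"adder": 2, "multiplier": 1}))  # 6
```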
The Cost is minimized subject to several constraints, or inequalities, taken into consideration by the integer linear program. The first constraint limits the number of cycles used in function units of type $i$ for operations of type $j$. Specifically,

$$\sum_{j=1}^{N_O} x_{ij} \le II \cdot f_i, \qquad i = 1, \ldots, N_F$$

where:

$N_O$ = the number of operation types to be scheduled for carrying out the required software steps;

$x_{ij}$ = the number of cycles in function units of type $i$ for operations of type $j$, greater than or equal to zero; and

$II$ = the desired initiation interval for the assembly or subassembly of function units.
This first inequality states that there must be enough time, or cycles, represented by the function units of each type in the subject iteration to carry out all of the operations assigned to that type.
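A sketch of the first inequality as a feasibility check, assuming a candidate assignment of cycles is already known: for each unit type $i$, the cycles used by all operation types must not exceed the initiation interval times the number of units $f_i$. The function name `cycles_fit`, the dictionary layout and the example values are hypothetical.

```python
def cycles_fit(x, mix, ii):
    """Return True if sum_j x[i][j] <= ii * f_i holds for every unit type i.

    x maps a unit type to {operation type: cycles used}; mix maps a unit type
    to f_i, the number of units of that type (missing types count as zero).
    """
    return all(
        sum(per_op.values()) <= ii * mix.get(unit, 0)
        for unit, per_op in x.items()
    )


# Cycles used per unit type in one iteration of the sample routine:
# add and sub on adders, mul on the multiplier (illustrative values).
x = {"adder": {"add": 1, "sub": 1}, "multiplier": {"mul": 1}}
print(cycles_fit(x, {"adder": 2, "multiplier": 1}, ii=1))  # True
print(cycles_fit(x, {"adder": 1, "multiplier": 1}, ii=1))  # False
print(cycles_fit(x, {"adder": 1, "multiplier": 1}, ii=2))  # True
```

Checking one constraint at a time like this is only a validation aid; the integer linear program chooses the $x_{ij}$ and $f_i$ jointly.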

A further constraint is that $f$, the function unit mix, is an integer vector. Another inequality makes the number of cycles devoted to a particular operation type in the iteration, counted in units of the corresponding data interval, greater than or equal to the number of operations of that type in the iteration. Specifically,

$$\sum_{i=1}^{N_F} \frac{x_{ij}}{d_{ij}} \ge o_j, \qquad j = 1, \ldots, N_O$$
Here,

$d_{ij}$ = the data interval, or number of time slots or cycles, in each unit of type $i$ for an operation of type $j$; and

$o_j$ = the number of operations of type $j$ to be scheduled for the iteration.
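The second inequality can be checked the same way: for each operation type $j$, the cycles devoted to it on every unit type, divided by the corresponding data interval $d_{ij}$, must cover the $o_j$ operations of that type. This hypothetical `operations_covered` helper uses `fractions.Fraction` so the division is exact rather than floating point; the library values are again invented for illustration.

```python
from fractions import Fraction


def operations_covered(x, d, o):
    """Return True if sum_i x[i][j] / d[i][j] >= o_j for every operation type j.

    x[i][j]: cycles on unit type i given to operation type j.
    d[i][j]: data interval (cycles per operation) of unit type i for operation j.
    o[j]:    number of operations of type j in one iteration.
    """
    for op, needed in o.items():
        completed = sum(
            (Fraction(per_op[op], d[unit][op]) for unit, per_op in x.items() if op in per_op),
            Fraction(0),
        )
        if completed < needed:
            return False
    return True


d = {"adder": {"add": 1, "sub": 1}, "alu": {"add": 1, "sub": 1, "mul": 2}}
o = {"add": 1, "mul": 1, "sub": 1}
# A mul on the ALU occupies 2 cycles, so 2 cycles complete one multiply.
print(operations_covered({"adder": {"add": 1, "sub": 1}, "alu": {"mul": 2}}, d, o))  # True
print(operations_covered({"adder": {"add": 1, "sub": 1}, "alu": {"mul": 1}}, d, o))  # False
```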

It should also be understood that the $x$'s and the $f$'s are non-negative. It can also be seen that the number of inequalities is the number of operation types plus the number of function unit types. Additionally, the number of unknowns is the product of the number of operation types and the number of function unit types (the $x$'s), plus the number of function unit types (the $f$'s).
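As a quick worked example of the problem size, assuming three function unit types and three operation types (add, multiply and subtract, as in the sample routine), the program has 3 + 3 = 6 inequalities and 3 * 3 + 3 = 12 unknowns, not counting the non-negativity and integrality conditions. The helper below, whose name is hypothetical, computes both counts:

```python
def problem_size(n_f, n_o):
    """Return (number of inequalities, number of unknowns) for the program.

    n_f: number of function unit types; n_o: number of operation types.
    Non-negativity and integrality conditions are not counted.
    """
    inequalities = n_o + n_f   # one per operation type plus one per unit type
    unknowns = n_o * n_f + n_f  # the x_ij plus the f_i
    return inequalities, unknowns


print(problem_size(3, 3))  # (6, 12)
```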

It is appreciated that a solution to the algorithm may not give a complete solution to the design problem. However, it is certain that the solution to the integer linear program using these inequalities will lead to a resource-based minimum initiation interval that is less than or equal to the defined initiation interval, and it is believed that it will produce a function unit ensemble close to a desired solution, as well as a lower bound on the least possible cost of the desired solution.
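The resource-based minimum initiation interval of a given mix can be illustrated with a brute-force sketch. It assumes, as a simplification, that each operation instance is assigned whole to one unit type and that cycles are pooled across the units of a type, mirroring the first inequality; a type with $f$ units and a load of $L$ cycles then needs $\lceil L/f \rceil$ cycles per iteration. The library values and the name `res_mii` are hypothetical, and exhaustive enumeration is only practical for toy examples like this one.

```python
import itertools

# Hypothetical toy library: unit type -> cost and data interval per operation.
LIBRARY = {
    "adder": {"cost": 1, "ops": {"add": 1, "sub": 1}},
    "multiplier": {"cost": 4, "ops": {"mul": 1}},
    "alu": {"cost": 3, "ops": {"add": 1, "sub": 1, "mul": 2}},
}
OPERATION_COUNTS = {"add": 1, "mul": 1, "sub": 1}


def res_mii(mix, op_counts, library):
    """Smallest II the mix can support, by resource usage alone.

    Every operation instance goes whole to one unit type present in the mix.
    Returns None if some operation has no capable unit in the mix.
    """
    instances = [op for op, n in op_counts.items() for _ in range(n)]
    choices = [
        [unit for unit, f in mix.items() if f > 0 and op in library[unit]["ops"]]
        for op in instances
    ]
    best = None
    for assignment in itertools.product(*choices):
        load = {}
        for op, unit in zip(instances, assignment):
            load[unit] = load.get(unit, 0) + library[unit]["ops"][op]
        # Integer ceiling of load / f_i for each unit type in use.
        ii = max(((cycles + mix[unit] - 1) // mix[unit] for unit, cycles in load.items()), default=0)
        best = ii if best is None else min(best, ii)
    return best


print(res_mii({"adder": 2, "multiplier": 1}, OPERATION_COUNTS, LIBRARY))  # 1
print(res_mii({"adder": 1, "alu": 1}, OPERATION_COUNTS, LIBRARY))         # 2
print(res_mii({"alu": 1}, OPERATION_COUNTS, LIBRARY))                     # 4
```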

These data, parameters and inequalities are then used to create the integer linear program at step 92, which is then solved using CPLEX or some other algorithm at step 94 to produce the ensemble of function units having a least cost and a resource-based minimum initiation interval less than or equal to the defined initiation interval. The processor 20 then produces, at step 96, a function unit list and determines a new resource-based minimum initiation interval. An algorithm such as a modulo scheduler is used to schedule operations on the listed function units and determine whether the function unit allocation produces the desired processor configuration having the desired initiation interval (step 98). In the example where the function units include the registers and busses, the hardware description language is generated if a successful mix is found. If not, the system branches to "A" (FIG. 7), where adjustments can be made to the function unit mix. In the example where the function units do not include the registers and busses, branch "A" is taken if the function unit allocation mix does not produce the desired initiation interval, and the process then returns to step 98. If the mix produces the desired initiation interval, the system continues.
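The description solves the program with CPLEX or another algorithm. Purely as a self-contained stand-in, the sketch below exhaustively enumerates small unit counts $f_i$, discards mixes above an optional cost ceiling (the role of the ceiling supplied at 84), and keeps the cheapest mix whose resource-based minimum initiation interval does not exceed the target. This is not CPLEX and not a linear programming method; it reuses a whole-operation assignment simplification, and `least_cost_mix`, `max_units` and `cost_ceiling` are hypothetical names.

```python
import itertools

# Hypothetical toy library: unit type -> cost and data interval per operation.
LIBRARY = {
    "adder": {"cost": 1, "ops": {"add": 1, "sub": 1}},
    "multiplier": {"cost": 4, "ops": {"mul": 1}},
    "alu": {"cost": 3, "ops": {"add": 1, "sub": 1, "mul": 2}},
}
OPERATION_COUNTS = {"add": 1, "mul": 1, "sub": 1}


def res_mii(mix, op_counts, library):
    """Resource-bound II with whole-operation assignment; None if infeasible."""
    instances = [op for op, n in op_counts.items() for _ in range(n)]
    choices = [[u for u, f in mix.items() if f > 0 and op in library[u]["ops"]] for op in instances]
    best = None
    for assignment in itertools.product(*choices):
        load = {}
        for op, unit in zip(instances, assignment):
            load[unit] = load.get(unit, 0) + library[unit]["ops"][op]
        ii = max(((c + mix[u] - 1) // mix[u] for u, c in load.items()), default=0)
        best = ii if best is None else min(best, ii)
    return best


def least_cost_mix(target_ii, op_counts, library, max_units=4, cost_ceiling=None):
    """Try every f_i in 0..max_units; keep the cheapest mix meeting target_ii."""
    names = list(library)
    best_mix, best_cost = None, None
    for counts in itertools.product(range(max_units + 1), repeat=len(names)):
        mix = {u: f for u, f in zip(names, counts) if f > 0}
        cost = sum(library[u]["cost"] * f for u, f in mix.items())
        if cost_ceiling is not None and cost > cost_ceiling:
            continue  # rejected by the cost ceiling
        if best_cost is not None and cost >= best_cost:
            continue  # cannot improve on the best mix found so far
        mii = res_mii(mix, op_counts, library)
        if mii is not None and mii <= target_ii:
            best_mix, best_cost = mix, cost
    return best_mix, best_cost


print(least_cost_mix(1, OPERATION_COUNTS, LIBRARY))  # ({'adder': 2, 'multiplier': 1}, 6)
print(least_cost_mix(2, OPERATION_COUNTS, LIBRARY))  # ({'adder': 1, 'alu': 1}, 4)
```

With this toy library, tightening the target from 2 to 1 moves the cheapest mix from an adder plus an ALU (cost 4) to two adders plus a multiplier (cost 6), and a cost ceiling of 5 leaves no acceptable mix for a target of 1.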

Register files are allocated (step 99), and at step 100 a bus structure is defined and allocated to the function units and the register files, either in the conventional manner or in accordance with an integer linear program created with the buses treated as the function units. The mix is then tested at 101, and if successful, a hardware description language can then be produced at step 102 using any one of a number of conventional techniques or algorithms.

If the function unit allocation does not produce the desired processor configuration at step 98 (or, in the case of separate register and bus definitions, at step 101), the function unit ensemble or mix can be adjusted (step 104, FIG. 7). FIG. 7 is a schematic representation of the types of adjustments that could be made if desired; it does not represent any particular order for the adjustments or testing. The entry point for branch "A" can be at any desired location along the chain in FIG. 7 and need not follow any particular order. Possible entry points include A1, A2 and A3, depending on whether the intended process is to add function units ("A1"), substitute function units ("A2") or both ("A3"). Once the mix is changed, it is re-evaluated to see whether it produces the desired initiation interval. For example, the system can enter at "A1" and add function units (step 106). The new function unit mix is then rescheduled or re-assigned with the buses and with the instruction level sample routine, and the combination is re-evaluated to see whether the new ensemble of function units carries out all the required operations with an initiation interval less than or equal to the defined initiation interval. If successful, the hardware description language can be produced, as at step 102. If not, other function units can be substituted in, and the new ensemble is rescheduled, assigned according to the instruction level sample routine and re-evaluated (step 108). If successful, the hardware description language is produced. If not, additional function units can be added and/or others substituted, and the new ensemble is rescheduled and evaluated with the buses and the required instruction level sample routine as before (step 110). The processor 20 then returns to produce the hardware description language (step 112).
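One of the FIG. 7 adjustments, entering at "A1" and adding function units (step 106), can be sketched as a greedy loop. In this hypothetical version, each step tries adding one unit of each type and keeps the trial with the lowest resource-based minimum initiation interval, breaking ties by total cost; the resource bound stands in for the modulo scheduler, and substitution (A2, A3) is not modeled. The greedy policy, the toy library and all names here are assumptions, not the procedure of FIG. 7 itself.

```python
import itertools
import math

# Hypothetical toy library: unit type -> cost and data interval per operation.
LIBRARY = {
    "adder": {"cost": 1, "ops": {"add": 1, "sub": 1}},
    "multiplier": {"cost": 4, "ops": {"mul": 1}},
    "alu": {"cost": 3, "ops": {"add": 1, "sub": 1, "mul": 2}},
}
OPERATION_COUNTS = {"add": 1, "mul": 1, "sub": 1}


def res_mii(mix, op_counts, library):
    """Resource-bound II with whole-operation assignment; None if infeasible."""
    instances = [op for op, n in op_counts.items() for _ in range(n)]
    choices = [[u for u, f in mix.items() if f > 0 and op in library[u]["ops"]] for op in instances]
    best = None
    for assignment in itertools.product(*choices):
        load = {}
        for op, unit in zip(instances, assignment):
            load[unit] = load.get(unit, 0) + library[unit]["ops"][op]
        ii = max(((c + mix[u] - 1) // mix[u] for u, c in load.items()), default=0)
        best = ii if best is None else min(best, ii)
    return best


def mix_cost(mix, library):
    return sum(library[u]["cost"] * f for u, f in mix.items())


def with_one_more(mix, unit):
    trial = dict(mix)
    trial[unit] = trial.get(unit, 0) + 1
    return trial


def score(trial, op_counts, library):
    """Rank a trial mix by resource-bound II (infeasible = infinity), then cost."""
    mii = res_mii(trial, op_counts, library)
    return (math.inf if mii is None else mii, mix_cost(trial, library))


def repair_mix(mix, target_ii, op_counts, library, max_steps=10):
    """Greedily add units until the resource-bound II meets target_ii.

    Returns the repaired mix, or None if max_steps additions are not enough.
    """
    mix = dict(mix)
    steps = 0
    while True:
        mii = res_mii(mix, op_counts, library)
        if mii is not None and mii <= target_ii:
            return mix
        if steps == max_steps:
            return None
        trials = [with_one_more(mix, unit) for unit in library]
        mix = min(trials, key=lambda t: score(t, op_counts, library))
        steps += 1


print(repair_mix({"adder": 1, "alu": 1}, 1, OPERATION_COUNTS, LIBRARY))
# {'adder': 1, 'alu': 1, 'multiplier': 1}
```

Starting from an adder plus an ALU with a target of 1, the loop adds a multiplier and stops at a cost-8 mix, whereas an exhaustive search of the same toy library for that target finds two adders plus a multiplier at cost 6. A greedy repair of this kind restores feasibility but need not reach the least cost mix.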

Even if the first solution to the integer linear program is not a complete solution, it is believed that it provides a close approximation and saves a substantial amount of time and design effort relative to having to start from scratch. Using the integer linear program helps solve the multi-dimensional allocation problem and speeds the selection of a low-cost function unit mix that can be used to carry out the required operations in the desired initiation interval.

The mixed integer linear program decreases the time and effort required for processor design and reduces the amount of trial and error used in processor design, especially for embedded processors or computing systems. Conventional algorithms can be used to solve the integer linear program, and their solution is a better, lower cost assembly of hardware components that can be designed to carry out the required program in the desired initiation interval than can be found using heuristic methods. It provides the desired low-cost function unit mix, or a beneficial starting point for determining the desired function unit mix. The use of the integer linear program reduces a cost associated with starting from scratch and provides a better starting point than would otherwise exist.

Having thus described several exemplary implementa- 
tions of the invention, it will be apparent that various 
alterations and modifications can be made without departing 
from the inventions or the concepts discussed herein. Such 
alterations and modifications, though not expressly
described above, are nonetheless intended and implied to be 
within the spirit and scope of the inventions. Accordingly, 
the foregoing description is intended to be illustrative only. 
For example, it should be understood that finding a least cost collection of function units is one preferred goal, and it is possible instead, or in addition, to optimize a different value for the combination. While minimizing cost in this manner can be viewed as maximizing the savings in production (the converse of cost), there could also be positive attributes of a hardware component that one wishes to promote in the final design. Therefore, optimization in the present invention includes maximizing a positive feature as well as minimizing a negative or otherwise less desirable feature.

APPENDIX 

The following appendix is an unpublished paper provided 
as part of this application, incorporated herein by reference. 

Robert Schreiber, Optimum Function Unit Allocation via Generalized Bin Packing and Mixed Integer Linear Programming.

I claim:

1. A method for selecting hardware components for a 
processor, the method comprising: 

receiving a description of a set of hardware components 
having one or more characteristics; 

receiving at least one of the one or more characteristics for 
use in evaluating a combination of the hardware com- 
ponents; 

creating an integer linear program using the one or more 
characteristics of the hardware components in the set of 
hardware components and constrained by the combi- 
nation of hardware components having a lowest cost 
and predicted not to exceed a maximum initiation 
interval while performing an iteration of a routine; 

solving the integer linear program for selecting the combination of hardware components to be included in the processor;

scheduling instructions on the combination of hardware 
components associated with the routine using a sched- 
uler; and 

modifying the combination of hardware components when the initiation interval associated with the scheduled instructions exceeds the maximum initiation interval.

2. The method of claim 1, further comprising: 
producing a list of hardware components for use in the processor and identifying the hardware components by computation type and data interval.



3. The method of claim 2, wherein producing a list further 
includes creating a hardware description language corre- 
sponding to the list. 

4. The method of claim 1, wherein receiving a description 
of a set of hardware components includes identifying at least 
two devices from components that can add, multiply, 
compare, and wherein the devices have characteristics that 
include cost, data interval and number of functions. 

5. The method of claim 1, wherein receiving at least one 
of the one or more characteristics includes identifying a cost 
of including a hardware component in a processor and 
identifying a data interval. 

6. The method of claim 5, wherein creating and solving 
further includes minimizing the cost of the hardware com- 
ponents. 

7. The method of claim 6, further including defining 
boundaries on the integer linear program. 

8. The method of claim 7, wherein defining the boundaries 
further includes defining an upper boundary for a total cost 
of the hardware components identified on the list. 

9. The method of claim 1, further comprising creating a 
library with representations of information identifying 
which operations occur on a particular hardware component. 
10. A processor including hardware components wherein
the components are made according to claim 1. 

11. A method for selecting hardware components for a 
processor, the method comprising: 

receiving a description for a set of hardware components 
having one or more characteristics including an ability 
to add, subtract, multiply, and a cost for each hardware 
component as a function of the one or more character- 
istics; 

receiving the cost as one of the one or more characteristics 
for use in evaluating a combination of the hardware 
components; 

creating an integer linear program using the one or more 
characteristics of the hardware components in the set of 
hardware components and constrained by the combi- 
nation of hardware components having a lowest cost 
and predicted not to exceed a maximum initiation 
interval while performing an iteration of a routine; 

solving the integer linear program to obtain the minimum 
cost assembly of hardware components and select the 
combination of hardware components to be included in 
the processor according to the constraints; and 

producing a list of hardware components for use in the 
processor. 

12. The method of claim 11, wherein receiving a set of 
hardware components includes identifying a set of hardware 
components that include registers and bus elements. 

13. A system for selecting hardware components for use 
in a processor, the system comprising: 

a computation system including a processor and memory 
in which is stored data and a solver for solving an 
integer linear program constrained by a combination of 
hardware components having a lowest cost and pre- 
dicted not to exceed a maximum initiation interval 
while performing an iteration of a routine, wherein the 
processor accesses the memory for executing the solver 
and processing the data to determine a solution to the 
solver, and wherein the combination of hardware com- 
ponents are modified when instructions scheduled on 
the hardware components by a scheduler exceed an 
initiation interval while processing an iteration associ- 
ated with the routine; 

means for accepting input data wherein the input data 
includes a value representing an initiation interval for a 
processor to be designed by the system; and 
means for producing output data representing hardware components to be incorporated into a processor.

14. The system of claim 13, wherein the means for 
accepting input data includes means for accepting data 
representing hardware components and cost values representing a cost of incorporating each hardware component
into a processor. 

15. The system of claim 13, wherein the means for 
accepting input data includes means for accepting data 
representing code representing a program language that
would be executed on a processor having the hardware 
components represented by the data produced by the means 
for producing output data representing hardware compo- 
nents. 

16. The system of claim 13, wherein the computation
system memory includes a database of hardware component 
characteristics including hardware component cost, hard- 
ware component throughput, and hardware component 
types. 

17. The system of claim 16, wherein the hardware component characteristics include characteristics of adders and
registers, and further includes data representing costs for 
each component characteristic. 

18. A processor made according to the system of claim 13. 

19. A process for generating a description language of a
new processor, the process comprising: 

receiving input in a computing machine including data 
representing a desired performance value of the new 
processor, data representing operations to be performed 
by a routine on the new processor, and data representing hardware components and hardware component
characteristics that are included in processors; 
receiving a solver in the computing machine for solving integer linear program problems;
receiving data representing criteria for defining an integer 
linear program constrained by a combination of hard- 
ware components having a lowest cost and predicted 
not to exceed a maximum initiation interval while 
performing an iteration of a routine; 
solving the integer linear program using the solver to 
produce the combination of the hardware components 
that are used to build the new processor; and 
modifying the combination of hardware components 
when instructions associated with the routine are sched- 
uled to execute on a subset of hardware components 
and a corresponding initiation interval exceeds the 
maximum initiation interval. 

20. The process of claim 19, wherein receiving data 
representing criteria includes receiving data and a defined 
relationship for finding a least cost assembly of hardware 
components for the new processor.

21. The process of claim 19, wherein receiving data 
representing hardware components includes receiving data 
for registers and bus structures. 